Approximate pattern matching with k-mismatches in packed text

Authors

  • Emanuele Giaquinta
  • Szymon Grabowski
  • Kimmo Fredriksson

Abstract

Given strings $P$ of length $m$ and $T$ of length $n$ over an alphabet of size $\sigma$, the string matching with $k$-mismatches problem is to find the positions of all the substrings in $T$ that are at Hamming distance at most $k$ from $P$. If $T$ can be read only one character at a time, the best known bounds are $O(n\sqrt{k \log k})$ and $O(n + n\sqrt{k/w}\,\log k)$ in the word-RAM model with word length $w$. In the RAM models (including $AC^0$ and word-RAM) it is possible to read up to $\lfloor w/\log\sigma \rfloor$ characters in constant time if the characters of $T$ are encoded using $\lceil \log\sigma \rceil$ bits. The only solution for $k$-mismatches in packed text works in $O\big((n\log\sigma/\log n)\lceil m\log(k + \log n/\log\sigma)/w \rceil + n^{\varepsilon}\big)$ time, for any $\varepsilon > 0$. We present an algorithm that runs in time $O\big(\frac{n}{\lfloor w/(m\log\sigma) \rfloor}(1 + \log\min(k,\sigma)\log m/\log\sigma)\big)$ in the $AC^0$ model if $m = O(w/\log\sigma)$ and $T$ is given packed. We also describe a simpler variant that runs in time $O\big(\frac{n}{\lfloor w/(m\log\sigma) \rfloor}\log\min(m, \log w/\log\sigma)\big)$ in the word-RAM model. The algorithms improve the existing bound for $w = \Omega(\log^{1+\varepsilon} n)$, for any $\varepsilon > 0$. Based on the introduced technique, we present algorithms for several other approximate matching problems.
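The abstract relies on packing characters at $\lceil \log\sigma \rceil$ bits each so that one word operation covers many characters. Below is a minimal Python sketch of that general word-level idea. It is not the authors' algorithm, which packs $\lfloor w/(m\log\sigma) \rfloor$ windows per word and is analysed in $AC^0$. Each text window is XORed with the packed pattern, and the non-zero fields (the mismatches) are counted with a standard SWAR bit trick. The helper names `pack`, `field_masks`, `count_nonzero_fields` and `k_mismatches_packed` are illustrative. Python's unbounded integers stand in for a machine word of $m\lceil \log\sigma \rceil$ bits.

```python
def bits_per_char(sigma):
    # ceil(log2 sigma) bits per character, but at least 1
    return max(1, (sigma - 1).bit_length())


def pack(s, alphabet, bits):
    # Pack s into one integer, `bits` bits per character; s[0] sits in the
    # lowest field, so field i holds the code of s[i].
    code = {c: i for i, c in enumerate(alphabet)}
    x = 0
    for i, c in enumerate(s):
        x |= code[c] << (i * bits)
    return x


def field_masks(bits, m):
    # high: top bit of each of the m fields; low: the remaining bits-1 bits.
    ones = sum(1 << (i * bits) for i in range(m))
    high = ones << (bits - 1)
    low = ((1 << (bits - 1)) - 1) * ones
    return high, low


def count_nonzero_fields(y, high, low):
    # Adding `low` carries into a field's top bit iff its lower bits are
    # non-zero; OR-ing y catches fields whose top bit was already set.
    # No carry crosses a field boundary, so fields are tested independently.
    t = ((y & low) + low) | y
    return bin(t & high).count("1")


def k_mismatches_packed(text, pattern, k, alphabet):
    # Report every i such that text[i:i+m] is within Hamming distance k of pattern.
    bits = bits_per_char(len(alphabet))
    n, m = len(text), len(pattern)
    T = pack(text, alphabet, bits)
    P = pack(pattern, alphabet, bits)
    window_mask = (1 << (m * bits)) - 1
    high, low = field_masks(bits, m)
    result = []
    for i in range(n - m + 1):
        # Extract the m packed characters starting at text position i.
        window = (T >> (i * bits)) & window_mask
        # XOR leaves a non-zero field exactly where the characters differ.
        if count_nonzero_fields(window ^ P, high, low) <= k:
            result.append(i)
    return result
```

For example, `k_mismatches_packed("ACGTACGA", "ACGA", 1, "ACGT")` returns `[0, 4]`. Position 0 differs in one character and position 4 is an exact match.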

Similar resources

A Parallel Algorithm for Fixed-Length Approximate String-Matching with k-mismatches

This paper deals with the approximate string-matching problem with Hamming distance. The approximate string-matching with k-mismatches problem is to find all locations at which a query of length m matches a factor of a text of length n with k or fewer mismatches. The approximate string-matching algorithms have both pleasing theoretical features, as well as direct applications, especially in comp...

On string matching with k mismatches

In this paper we consider several variants of the pattern matching problem. In particular, we investigate the following problems: 1) Pattern matching with k mismatches; 2) Approximate counting of mismatches; and 3) Pattern matching with mismatches. The distance metric used is the Hamming distance. We present some novel algorithms and techniques for solving these problems. Both deterministic and...

Exact and Approximate Two Dimensional Pattern Matching allowing Rotations

We give fast filtering algorithms for searching a 2-dimensional pattern in a 2-dimensional text allowing any rotation of the pattern. We consider the cases of exact and approximate matching under several matching models, improving the previous results. For a text of size $n \times n$ characters and a pattern of size $m \times m$ characters, the exact matching takes average time $O(n^2/m)$. If we allow k-mismatches o...

Approximate Boyer-Moore String Matching

The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm is shown (under a mild independence assumption) to solve the pro...

Approximate String Matching by Finite Automata

Abstract. Approximate string matching is a sequential problem and therefore it is possible to solve it using finite automata. A nondeterministic finite automaton is constructed for string matching with k mismatches. It is shown how "dynamic programming" and "shift-and" based algorithms simulate this nondeterministic finite automaton. The corresponding deterministic finite automaton have O...
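The last related abstract describes simulating the k-mismatches automaton with "shift-and" based algorithms. Below is a minimal Python sketch of the closely related Shift-Add idea of Baeza-Yates and Gonnet. It is not taken from that paper. Each pattern position gets a counter field in one integer. For every text character the state is shifted by one field and a precomputed mismatch mask is added. As a simplifying assumption, fields are made wide enough to hold counts up to $m$ so they never overflow; the classic formulation uses narrower fields with overflow bits. The name `shift_add_k_mismatches` is illustrative.

```python
def shift_add_k_mismatches(text, pattern, k):
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    # Field width large enough to store any count 0..m without overflow
    # (a simplification of the classic narrower fields plus overflow bits).
    B = m.bit_length()
    full = (1 << (m * B)) - 1
    field = (1 << B) - 1
    top = (m - 1) * B
    # masks[c] has a 1 in field j exactly when pattern[j] != c.
    masks = {}
    for c in set(text) | set(pattern):
        v = 0
        for j, pc in enumerate(pattern):
            if pc != c:
                v |= 1 << (j * B)
        masks[c] = v
    state = 0
    result = []
    for i, c in enumerate(text):
        # Field j now counts mismatches of pattern[0..j] against text[i-j..i];
        # the counter shifted past field m-1 is discarded by the mask.
        state = ((state << B) + masks[c]) & full
        if i >= m - 1 and ((state >> top) & field) <= k:
            result.append(i - m + 1)
    return result
```

On the same input as a direct check, `shift_add_k_mismatches("ACGTACGA", "ACGA", 1)` returns `[0, 4]`.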

Journal:
  • Inf. Process. Lett.

Volume 113

Publication date 2013